fix(das): deep-copy worker failed map in getState to avoid stats race#5057

Open
neyy91 wants to merge 1 commit into
celestiaorg:mainfrom
neyy91:fix/das-stats-map-race
@neyy91 neyy91 commented Jun 13, 2026

Overview

worker.getState() returned workerState by value, but the embedded result.failed map was shared by reference with the running worker. The coordinator's unsafeStats iterates that map (via getState) without holding the worker lock, while the worker keeps writing to it under the lock during sampling. Concurrent map iteration and map write is a fatal runtime error, so polling SamplingStats or the periodic background checkpoint could crash a sampling node.

Fix: deep-copy the map under the lock so each snapshot is independent. Pausing the coordinator (the existing stats() mechanism) does not pause workers, so the snapshot must not alias worker-owned state.
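A minimal sketch of the deep-copy approach described above. The types here are simplified stand-ins, not the real `das` package definitions: the field names other than `failed`, the map's key and value types, and the `setFailed` helper are assumptions made for illustration.

```go
package main

import "sync"

// result and workerState are simplified stand-ins for the das types.
// The map's key/value types (height -> retry count) are assumed.
type result struct {
	failed map[uint64]int
}

type workerState struct {
	result
	curr uint64 // hypothetical scalar field, copied safely by value
}

type worker struct {
	lock  sync.Mutex
	state workerState
}

// getState returns a snapshot that shares no memory with the worker.
// Copying the struct by value duplicates scalar fields, but a map field
// is only a reference, so the map is rebuilt while the lock is held.
func (w *worker) getState() workerState {
	w.lock.Lock()
	defer w.lock.Unlock()

	snapshot := w.state
	snapshot.failed = make(map[uint64]int, len(w.state.failed))
	for height, retries := range w.state.failed {
		snapshot.failed[height] = retries
	}
	return snapshot
}

// setFailed is a hypothetical writer that models the worker updating
// its failed map under the lock during sampling.
func (w *worker) setFailed(height uint64, retries int) {
	w.lock.Lock()
	defer w.lock.Unlock()
	if w.state.failed == nil {
		w.state.failed = make(map[uint64]int)
	}
	w.state.failed[height] = retries
}
```

With this shape, a caller can iterate the returned `failed` map without holding the worker lock, because later `setFailed` calls mutate a different map. One side effect of this sketch is that a nil source map becomes an empty non-nil map in the snapshot.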

Adds a regression test reproducing the concurrent read/write; it fails under -race before the fix.
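The regression test itself is not shown in this description. The following is only a sketch of how such a concurrent read/write could be exercised, using a hypothetical simplified `worker` and a hypothetical `exercise` helper. If `getState` returned the worker's map directly instead of a copy, running this kind of loop with the race detector enabled would be expected to report a data race, or the runtime could abort with a concurrent map iteration and map write error.

```go
package main

import "sync"

// worker is a simplified stand-in; only the failed map is modeled.
type worker struct {
	lock   sync.Mutex
	failed map[uint64]int
}

// markFailed models the worker writing to its map under the lock.
func (w *worker) markFailed(height uint64) {
	w.lock.Lock()
	defer w.lock.Unlock()
	w.failed[height]++
}

// getState returns an independent copy of the map, built under the lock.
func (w *worker) getState() map[uint64]int {
	w.lock.Lock()
	defer w.lock.Unlock()
	snapshot := make(map[uint64]int, len(w.failed))
	for height, retries := range w.failed {
		snapshot[height] = retries
	}
	return snapshot
}

// exercise runs one writer goroutine while the caller repeatedly takes
// snapshots and iterates them without holding the lock, mirroring how
// a stats reader would consume getState. It returns the final number of
// entries and the number of entries read across all snapshots.
func exercise(w *worker, rounds int) (entries int, reads int) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for h := 0; h < rounds; h++ {
			w.markFailed(uint64(h))
		}
	}()

	for i := 0; i < rounds; i++ {
		for range w.getState() { // lock-free iteration over the copy
			reads++
		}
	}

	wg.Wait()
	return len(w.getState()), reads
}
```

In a real test file, the body of `exercise` would sit inside a `TestXxx(t *testing.T)` function and be run with `go test -race`.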

Closes #5056

getState returned workerState by value, but the embedded result's `failed`
map was shared by reference with the running worker. The coordinator's
unsafeStats path iterates that map (via getState) without holding the worker
lock, while the worker keeps writing to it under the lock during sampling.
Concurrent map iteration and map write is a fatal runtime error, so polling
SamplingStats or the periodic checkpoint could crash a node that is sampling.

Deep-copy the map under the lock so each snapshot is independent. Adds a
regression test reproducing the concurrent read/write (fails under -race
before the fix).
@neyy91 neyy91 requested a review from a team as a code owner June 13, 2026 11:27
@neyy91 neyy91 requested a review from evan-forbes June 13, 2026 11:27
@github-actions github-actions Bot added the external Issues created by non node team members label Jun 13, 2026
Labels

external Issues created by non node team members

Development

Successfully merging this pull request may close these issues.

das: Deep-copy worker failed map in getState to fix SamplingStats race
